Skip to content

fix(bg): detach --web-bg child from Windows job to prevent kill-on-close#200

Merged
jrob5756 merged 1 commit into
mainfrom
fix/195-web-bg-windows-job-breakaway
May 18, 2026
Merged

fix(bg): detach --web-bg child from Windows job to prevent kill-on-close#200
jrob5756 merged 1 commit into
mainfrom
fix/195-web-bg-windows-job-breakaway

Conversation

@jrob5756
Copy link
Copy Markdown
Collaborator

Closes #195.

Problem

conductor run --web-bg and conductor resume --web-bg fork a detached
child via subprocess.Popen with only CREATE_NEW_PROCESS_GROUP on
Windows. When launched from a shell wrapper that runs commands inside a
Windows job object with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE (GitHub
Actions runners, VS Code integrated terminal, JetBrains IDE terminals,
GitHub Copilot CLI shell tool), the child inherits the parent's job and
is killed within ~10 seconds when the parent exits and the job tears
down. Users see a Dashboard: http://... line but the workflow never
makes progress and the PID file is orphaned.

Fix

Add CREATE_BREAKAWAY_FROM_JOB to the Windows creationflags so the
child fully detaches from the parent's job object.

Fallback for hardened CI

Some hardened CI environments clear JOB_OBJECT_LIMIT_BREAKAWAY_OK on
their job, which makes CreateProcess raise ERROR_ACCESS_DENIED. In
that case we:

  1. Emit a visible stderr warning so the user understands --web-bg
    may not survive shell exit in this environment and should run from a
    non-job-managed shell.
  2. Retry without CREATE_BREAKAWAY_FROM_JOB so the launch still works
    in environments where the job is not KILL_ON_JOB_CLOSE.

The fallback is not silent — a silent fallback would make --web-bg
look successful (URL printed, server briefly reachable) while the child
still dies, which is exactly the user-visible failure mode of #195.

Only OSError with winerror == 5 triggers the fallback. Other
OSErrors (e.g. missing executable / FileNotFoundError) propagate
unchanged so the existing RuntimeError("Failed to start background process: ...") wrapper still surfaces them cleanly.

Refactor

The two near-identical detachment + Popen blocks in launch_background
and launch_background_resume are consolidated into a single
_spawn_detached(cmd, env) helper backed by _detachment_kwargs() and
_is_breakaway_denied(). This also enforces parity between the run and
resume paths going forward.

Module-level _CREATE_NEW_PROCESS_GROUP and _CREATE_BREAKAWAY_FROM_JOB
constants are resolved via getattr(subprocess, ..., default) (same
pattern as cli/update.py) so the module remains importable on POSIX
hosts and tests can patch sys.platform to "win32" from Linux/macOS.

Tests

New tests/test_cli/test_bg_runner.py (no kwarg-level coverage of
launch_background existed previously — only launch_background_resume):

  • _detachment_kwargs returns start_new_session=True on POSIX and
    CREATE_NEW_PROCESS_GROUP | CREATE_BREAKAWAY_FROM_JOB on Windows.
  • _is_breakaway_denied narrowly classifies winerror == 5 only —
    other winerrors and missing winerror attribute return False.
  • _spawn_detached happy paths on POSIX and Windows.
  • _spawn_detached breakaway-denied path: first Popen raises
    OSError(winerror=5), second Popen called WITHOUT
    CREATE_BREAKAWAY_FROM_JOB, stderr warning emitted, stdio + env
    preserved across the retry.
  • _spawn_detached non-breakaway OSError (e.g. winerror=2)
    propagates from the first Popen without a retry.
  • _spawn_detached POSIX OSError never retries.
  • Both launch_background and launch_background_resume route through
    _spawn_detached (asserted via patch + call inspection) — so the fix
    applies to both run and resume paths.

tests/test_cli/test_resume_command.py::test_subprocess_detachment_kwargs
updated to expect the OR'd flag combo on Windows.

Validation

  • make lint
  • make typecheck ✅ (1 pre-existing warning in dialog_evaluator.py,
    unrelated to this change)
  • make test ✅ — 2659 passed, 12 skipped in ~8 min on macOS
  • The Windows-only assertion is exercised in CI on the Windows runner.

Notes

  • No public API change. launch_background and
    launch_background_resume keep the same signatures and return types.
  • No docs change needed — the AGENTS.md "Run / Resume Parity" section
    already covers --web-bg parity, which this fix preserves.
  • The decision to fail loud rather than silently fall back was made
    after a rubber-duck review flagged the silent-fallback risk.

@jrob5756 jrob5756 force-pushed the fix/195-web-bg-windows-job-breakaway branch from 61eeccc to b75fe0d Compare May 18, 2026 17:46
When `conductor run --web-bg` or `conductor resume --web-bg` is launched
from a shell wrapper that runs commands inside a Windows job object with
JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE (GitHub Actions runners, VS Code
integrated terminal, JetBrains IDE terminals, GitHub Copilot CLI shell
tool), the detached child inherits the parent's job and is killed within
~10 seconds when the parent exits and the job tears down. Users see a
dashboard URL but the workflow never makes progress.

The Popen call previously requested only CREATE_NEW_PROCESS_GROUP. This
adds CREATE_BREAKAWAY_FROM_JOB so the child fully detaches from the
parent's job object.

Some hardened CI environments clear JOB_OBJECT_LIMIT_BREAKAWAY_OK on
their job, in which case CreateProcess raises ERROR_ACCESS_DENIED. In
that case, emit a visible stderr warning (so the user understands bg
mode may not survive shell exit and should run from a non-job-managed
shell) and retry without the breakaway flag. Other OSErrors (e.g.
missing executable) propagate unchanged so the existing RuntimeError
wrapper still surfaces them cleanly.

Refactors the two near-identical detachment+Popen blocks in
launch_background and launch_background_resume into a single
_spawn_detached helper backed by _detachment_kwargs and
_is_breakaway_denied. Adds module-level _CREATE_NEW_PROCESS_GROUP and
_CREATE_BREAKAWAY_FROM_JOB constants resolved via getattr so this
module remains importable on POSIX hosts and tests can patch
sys.platform to "win32" from Linux/macOS.

Tests cover:
- _detachment_kwargs returns start_new_session on POSIX and OR'd
  creationflags on Windows.
- _is_breakaway_denied narrowly classifies winerror == 5 only.
- _spawn_detached happy paths (POSIX, Windows).
- _spawn_detached breakaway-denied fallback emits stderr warning,
  retries without CREATE_BREAKAWAY_FROM_JOB, preserves stdio + env.
- Non-breakaway OSError propagates without retry.
- launch_background and launch_background_resume both route through
  _spawn_detached so the fix applies to run and resume paths alike.

The existing test_subprocess_detachment_kwargs assertion is updated to
expect the OR'd flag combo on Windows.

Closes #195

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jrob5756 jrob5756 force-pushed the fix/195-web-bg-windows-job-breakaway branch from b75fe0d to c9bbbf0 Compare May 18, 2026 20:55
@jrob5756 jrob5756 merged commit 752a9b5 into main May 18, 2026
9 checks passed
@jrob5756 jrob5756 deleted the fix/195-web-bg-windows-job-breakaway branch May 18, 2026 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

--web-bg child process dies when launched from a job-managed shell on Windows (missing CREATE_BREAKAWAY_FROM_JOB)

1 participant